ABSTRACT

This report describes the use of a computer program that converts
a grammar's production rules from extended Backus-Naur-Form (EBNF) to
another equivalent set of production rules in ordinary Backus-Naur-Form
(BNF) suitable for use with the Yet Another Compiler-Compiler (YACC)
system. This permits the language designer to use the far less bulky
EBNF formats, and then to automatically convert to BNF for use with
YACC. A PDP-11 computer system running the UNIX operating system is
assumed.

I  Introduction

This report describes the use of a computer program that
converts grammar production rules in Extended Backus-Naur-Form
(EBNF) into ordinary Backus-Naur-Form (BNF). EBNF is very
convenient for a human description of a grammar but is not in a
format acceptable to the Yet Another Compiler-Compiler (YACC) system
[John75]. YACC requires the far more bulky format of ordinary BNF,
which is inconvenient for human use. The program whose use is
described here is itself a translator written for the YACC
system; the BNF it produces can be used as the input to YACC to
yield a parse table and other processing for the original EBNF
grammar.

The EBNF to BNF converter program is stored in the Naval
Postgraduate School Computer Sciences Laboratory under the name
"ebnftobnf". It is intended to work on a PDP-11 under the UNIX
operating system. This technical report may be accessed on the
UNIX system by typing "man ebnftobnf".

II  The  EBNF  Syntax 


The EBNF syntax acceptable as input to the converter is
presented in this section. An example grammar is also presented.

EBNF makes use of grammar production rules consisting of
terminals, nonterminals, and a replacement operator. In the
discussion that follows we assume that terminal tokens are uppercase
letters or strings of letters, or are enclosed in single quotes.
The latter is usually reserved for trivial terminals such as
parentheses, semicolons, etc. Nonterminals are lowercase letters
or strings of letters. The head symbol is the nonterminal "z", as
is the convention in some textbooks. The replacement operator is
the left arrow, written as <--.

Two sets of metasymbols in EBNF must be removed from the
grammar (by modifying the production rules) to produce an
equivalent BNF grammar. These are the square brackets [...]
meaning "zero or one", and curly brackets {...} meaning "zero or
more". As is usual in production rules the vertical bar | means
"or".

Consider the following example in EBNF:

     z <-- [A] C

In BNF two production rules are needed to express an equivalent
grammar:

     z <-- C | A C

or

     z <-- a' C
     a' <-- null | A

In either case the grammar accepts only the strings "C" or "AC".

Consider the use of the curly brackets to mean "zero or more":

     z <-- A {A}

This produces all the strings of the form A, AA, AAA, AAAA, and
so on, so the BNF equivalent must be:

     z <-- z A | A

or

     z <-- z A
     z <-- A

The advantage of using EBNF to describe a grammar is obvious
from these examples; it is unfortunate that YACC will not accept
a grammar in this form. In the next section the exact format of
the EBNF productions required for processing by YACC is
presented.
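
The two rewrites described in this section can be sketched in C. This is only an illustration under stated assumptions, not the converter's own code (which is listed in the appendix): the function names bnf_optional and bnf_repeat are hypothetical, the name of the new nonterminal is supplied by the caller, and the "zero or more" case is written in a general left-recursive form with a null alternative rather than the specialized form given for z <-- A {A}.

```c
#include <stdio.h>

/* Hypothetical helper: write the BNF rule that replaces an optional
   group [body].  "name" is a new nonterminal chosen by the caller.
   Returns snprintf's result, the length of the full rule text. */
int bnf_optional(char *out, size_t size, const char *name, const char *body)
{
    return snprintf(out, size, "%s <-- null | %s", name, body);
}

/* Hypothetical helper: write the BNF rule that replaces a repeated
   group {body}.  The rule is left recursive, so it derives zero or
   more copies of body. */
int bnf_repeat(char *out, size_t size, const char *name, const char *body)
{
    return snprintf(out, size, "%s <-- null | %s %s", name, name, body);
}
```

For instance, bnf_optional(buf, sizeof buf, "a'", "A") fills buf with the rule a' <-- null | A, after which z <-- [A] C can be written as z <-- a' C.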

III  Use of the Converter Program


In this section a simple EBNF grammar is modified to the
format acceptable to YACC, and the grammar is converted to BNF by
the translator program.

As an example grammar consider the following production
rules:

     z <-- {b} ;
     b <-- [C] [A] D

Here z and b are nonterminals and A, C, D, and ; are terminals.
How might these productions be modified to a format acceptable to
the translator program?

Several symbols must be replaced in the EBNF used above to
make the productions acceptable to YACC. First, the replacement
operator must be a colon (:) instead of a left arrow (<--).
Secondly, all trivial terminals (i.e. parentheses, semicolons,
etc.) must be enclosed in single quotes ('). Thirdly, all other
terminals must be explicitly declared to YACC with %token
statements. Finally, the head symbol production rule must be the
first (top) rule.

The above example production rules are manually converted to
yield the following:

     %token A C D
     %%
     z : {b} ';' ;
     b : [C] [A] D ;

As many of the %token statements as needed can be used.
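
The first of these manual steps is mechanical enough to sketch in C. The function below is a hypothetical helper, not part of the converter; it assumes the left arrow is spelled "<--" and that each rule occupies one string. It replaces the arrow with a colon and appends the terminating semicolon seen in the example. Quoting trivial terminals, writing the %token statements, and ordering the rules are still left to the user.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy one production into out, replacing the
   first "<--" with ":" and appending " ;".  Returns -1 if the rule
   contains no arrow, otherwise snprintf's result. */
int arrow_to_colon(char *out, size_t size, const char *rule)
{
    const char *arrow = strstr(rule, "<--");

    if (arrow == NULL)
        return -1;
    /* text before the arrow, a colon, text after the arrow, " ;" */
    return snprintf(out, size, "%.*s:%s ;",
                    (int)(arrow - rule), rule, arrow + 3);
}
```

Given the rule b <-- [C] [A] D, the helper produces b : [C] [A] D ; as in the converted example.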

Now consider the execution of the EBNF to BNF translator.
Since it is itself a YACC input program, it must first be
processed by YACC:

     yacc ebnftobnf

This produces a file in your file space named "y.tab.c". The
next step is to compile the C program in file "y.tab.c" by
typing:

     cc y.tab.c -ly

This produces a file named "a.out" that can actually translate
EBNF to BNF by the following command:

     a.out <ebnffile >bnffile

where "ebnffile" is the EBNF input file requiring translation;
the ordinary BNF equivalent will result in file "bnffile". Choose
whatever names you like for these files. The appendix shows the
example presented above before and after translation.


IV  Using the BNF Equivalent


In this section the use of the BNF equivalent as input to
another YACC process is described.

The whole purpose of the EBNF to BNF conversion process was
to produce a set of production rules acceptable to YACC, and thus
be able to build a "compiler" that can process a "program" in the
grammar to produce either a "yes" or "no" answer as to the
program's syntax correctness or to compile it to some other
target language. To accomplish this the equivalent BNF grammar must
be embedded among other statements that indicate the terminal
tokens and a C program (possibly making use of LEX [Lesk]).

To do this you must produce the same list of terminals used
in the conversion process (%token, %%), and prepend it to
the "bnffile". One VERY IMPORTANT production rule modification
must be accomplished prior to resubmitting the "bnffile". The
conversion process typically revises the order of the production
rules due to the inclusion of new rules with new nonterminals. Be
sure to insert the original head symbol production rule back at
the very top of the list of rules; YACC requires this if a
correct parse table is to result. It may have been moved down the
list if it had square or curly brackets in its right hand side.
Finally append any C program for processing the grammar into a
target language to the list of production rules; separate them by
a %% delimiter line. See the YACC manual for details.
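
The reordering step can also be sketched in C for readers who prefer not to do it in the editor. This is a minimal illustration under stated assumptions: the functions defines_symbol and move_head_first are hypothetical, each rule is held as one string of the form "name : ...", and the name of the head symbol is passed in by the caller.

```c
#include <string.h>

/* Hypothetical helper: nonzero if the rule text begins with the
   nonterminal "head" followed by optional blanks and a colon. */
static int defines_symbol(const char *rule, const char *head)
{
    size_t n = strlen(head);

    if (strncmp(rule, head, n) != 0)
        return 0;
    rule += n;
    while (*rule == ' ' || *rule == '\t')
        rule++;
    return *rule == ':';
}

/* Hypothetical helper: move the head symbol's rule to rules[0],
   keeping the other rules in their existing order.  Returns the
   rule's original index, or -1 if no rule defines the head symbol. */
int move_head_first(const char *rules[], int count, const char *head)
{
    int i, j;

    for (i = 0; i < count; i++) {
        if (defines_symbol(rules[i], head)) {
            const char *saved = rules[i];

            for (j = i; j > 0; j--)      /* shift earlier rules down */
                rules[j] = rules[j - 1];
            rules[0] = saved;
            return i;
        }
    }
    return -1;
}
```

Applied to the appendix example with head symbol "z", the rule for z moves from second place to first and the generated rule for fst.b. follows it.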

V  Conclusion


This report describes how to convert an EBNF grammar to BNF
suitable to YACC. While the program has been tested and found to
work satisfactorily the usual disclaimer as to correctness must
be made. The conversion process yields new production rules with
new nonterminals. These new nonterminals are formed by
concatenating the original nonterminals with prefixes such as "fst."
and "opt.", and the results for a complicated grammar can get
quite long. Use the editor to shorten them up if desired, but
preserve the uniqueness of each nonterminal. Some nonterminals
may contain sequences such as "._."; these are acceptable to YACC
and so may be left unchanged.
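
After shortening names by hand, uniqueness can be checked mechanically. The following C sketch is an illustration only; the function names_unique is hypothetical and assumes the nonterminal names have already been collected into an array of strings.

```c
#include <string.h>

/* Hypothetical helper: return 1 if no two of the count names are
   equal, 0 as soon as a duplicate pair is found. */
int names_unique(const char *names[], int count)
{
    int i, j;

    for (i = 0; i < count; i++)
        for (j = i + 1; j < count; j++)
            if (strcmp(names[i], names[j]) == 0)
                return 0;
    return 1;
}
```

The pairwise comparison is quadratic in the number of names, which is of no concern for the handful of nonterminals in a typical grammar.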

[John75]  Johnson, Stephen C., "YACC - Yet Another Compiler-
Compiler", Bell Laboratories, Murray Hill, NJ 07974.


[Lesk]  Lesk, M. E., and E. Schmidt, "Lex - A Lexical Analyzer
Generator", Bell Laboratories, Murray Hill, NJ 07974.

APPENDIX


****  The following is an example input file (ebnffile)  ****


%token A C D
%%
z : {b} ';' ;
b : [C] [A] D ;


****  The following is an example output file (bnffile).  ****
****  Note that the first two rules must be interchanged  ****
****  if it is to be used as part of a YACC input via     ****
****  the a.out process.                                  ****
****  Note the null production: fst.b. : null | fst.b. b ;  ****


fst.b. :
     | fst.b. b ;
z :
     fst.b. ';' ;
opt.C. :
     | C ;
opt.A. :
     | A ;
b :
     opt.C. opt.A. D ;

****  Following is a listing of the "ebnftobnf" program  ****


%token  SYMBOL  LITERAL
%{
#define  NULL  0

struct  node
{   char  symbol[30];
    struct  node  *first;
    struct  node  *next;
};

char  symbol[30];

struct  node  *pn;
%}
%%


grammar:
        rule_list;

rule_list:
        rule
    |   rule_list  rule;

rule:
        nonterm  ':'  alternative_list  ';'
        = {   printf ("%s%c\n    ", $1->symbol, ':');
              for (pn = $3; pn != NULL; pn = pn->next)
              {   pitems (pn->first);
                  if (pn->next == NULL) printf (" ");
                  else printf ("\n  |  ");
              }
              printf (";\n");
          }
        ;

nonterm:
        SYMBOL
        = {   $$ = ncreate (symbol, NULL, NULL);
          }
        ;

alternative_list:
        alternative
        = {   $$ = ncreate ("a", $1, NULL);
          }
    |   alternative_list  '|'  alternative
        = {   last ($1)->next = ncreate ("a", $3, NULL);
          }
        ;

alternative:
        = {   $$ = ncreate (" ", NULL, NULL);
          }
    |   element_list;

element_list:
        element
    |   element_list  element
        = {   last ($1)->next = $2;
          }
        ;

element:
        SYMBOL
        = {   $$ = ncreate (symbol, NULL, NULL);
          }
    |   LITERAL
        = {   $$ = ncreate (symbol, NULL, NULL);
          }
    |   '['  element_list  ']'
        = {   $$ = ncreate ("o", $2, NULL);
              if (!lookup ($$))
              {   printf ("\n");
                  pitem ($$);
                  printf ("%c\n|  ", ':');
                  pitems ($2);
                  printf (" ; ");
              }
          }
    |   '{'  element_list  '}'
        = {   $$ = ncreate ("l", $2, NULL);
              if (!lookup ($$))
              {   printf ("\n");
                  pitem ($$);
                  printf ("%c\n|  ", ':');
                  pitem ($$);
                  printf (" ");
                  pitems ($2);
                  printf (";");
              }
          }
        ;

%%


#define  LETTER  'a'
#define  DIGIT   '0'

/* Lexical analyzer: returns SYMBOL, LITERAL, or a single character. */
yylex ()
{
    int  i, t, getch ();
    char  c;

    while ((c = getch ()) == ' ' || c == '\n' || c == '\t');
    if (type (c) == LETTER)
    {   i = 0;
        symbol[i++] = c;
        while ((t = type (c = symbol[i++] = getch ())) == LETTER
               || t == DIGIT || c == '_' || c == '.');
        ungetch (c);
        symbol[--i] = '\0';
        return (SYMBOL);
    }
    else if (c == '\'')
    {   i = 0;
        symbol[i++] = c;
        while ((c = symbol[i++] = getch ()) != '\'');
        symbol[i] = '\0';
        return (LITERAL);
    }
    else return (c);
}

/* Classify a character as LETTER, DIGIT, or itself. */
type (c)
char  c;
{
    if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z') return (LETTER);
    if (c >= '0' && c <= '9') return (DIGIT);
    return (c);
}


/* Create a new node holding a copy of string. */
ncreate (string, first, next)
char  *string;
struct  node  *first, *next;
{
    struct  node  *p;

    p = alloc (40);
    strcpy (p->symbol, string);
    p->first = first;
    p->next = next;
    return (p);
}

/* Return the last node of the list starting at np. */
last (np)
struct  node  *np;
{
    struct  node  *p;

    for (p = np; p->next != NULL; p = p->next);
    return (p);
}

strcpy (s, t)
char  *s, *t;
{
    while (*s++ = *t++);
}

/* Print every item of a list, each followed by a blank. */
pitems (np)
struct  node  *np;
{
    struct  node  *p;

    for (p = np; p != NULL; p = p->next)
    {   pitem (p);
        printf (" ");
    }
}


/* Print one item; a bracketed group prints as its generated name. */
pitem (np)
struct  node  *np;
{
    if (np->first == NULL) printf ("%s", np->symbol);
    else
    {   if (strcmp (np->symbol, "o") == 0) printf ("opt");
        else printf ("fst");
        plist (np->first);
    }
}

/* Print the name suffix formed from the members of a group. */
plist (np)
struct  node  *np;
{
    while (np != NULL)
    {   if (np->first == NULL)
            if (*(np->symbol) == '\'') printf ("._");
            else printf (".%s", np->symbol);
        else if (strcmp (np->symbol, "o") == 0)
        {   printf ("..opt");
            plist (np->first);
        }
        else
        {   printf ("..lst");
            plist (np->first);
        }
        np = np->next;
    }
    printf (".");
}


strcmp (s, t)
char  s[], t[];
{
    int  i;

    i = 0;
    while (s[i] == t[i])
        if (s[i++] == '\0') return (0);
    return (s[i] - t[i]);
}

char  buf[1];
int  bufp  0;

/* Read a character, taking a pushed-back one first if present. */
getch ()
{
    return ((bufp == 0) ? getchar () : buf[--bufp]);
}

ungetch (c)
int  c;
{
    buf[bufp++] = c;
}

#define  TRUE   1
#define  FALSE  0

struct  node  *newnonterm[100];
int  nonew  0;

/* Return TRUE if an equal group was seen before; otherwise record it. */
lookup (np)
struct  node  *np;
{
    int  i;

    for (i = 0; i < nonew; i++)
        if (equal (np, newnonterm[i]))
            return (TRUE);
    newnonterm[nonew++] = np;
    return (FALSE);
}


equal (x, y)
struct  node  *x, *y;
{
    if (strcmp (x->symbol, y->symbol) != 0) return (FALSE);
    else return (eqlist (x->first, y->first));
}


eqlist (x, y)
struct  node  *x, *y;
{
    while (x != NULL && y != NULL)
    {   if (!eqtype (x, y)) return (FALSE);
        if (strcmp (x->symbol, y->symbol) != 0) return (FALSE);
        if (x->first != NULL)
            if (!eqlist (x->first, y->first)) return (FALSE);
        x = x->next;
        y = y->next;
    }
    if (x != y) return (FALSE);
    else return (TRUE);
}


eqtype (x, y)
struct  node  *x, *y;
{
    if (x->first == NULL) return (y->first == NULL);
    if (y->first == NULL) return (FALSE);
    if (*(x->symbol) == 'o') return (*(y->symbol) == 'o');
    return (*(y->symbol) == 'l');
}